Deduplication algorithm based on Winnowing fingerprint matching
WANG Qingsong, GE Hui
Journal of Computer Applications, 2018, 38(3): 677-681. DOI: 10.11772/j.issn.1001-9081.2017082023
Abstract
In big-data deduplication, Content-Defined Chunking (CDC) algorithms have three problems: the chunk size is difficult to control, fingerprint calculation and comparison are expensive, and parameters must be set in advance. To address these problems, a Deduplication algorithm based on Winnowing Fingerprint Matching (DWFM) was proposed. First, a chunk-size prediction model was introduced before chunking to calculate an appropriate chunk size for the application scenario. Then, ASCII/Unicode values were used as the data block fingerprints during fingerprint calculation. Finally, chunk boundaries were determined by chunk fingerprint matching, which requires no parameters to be set in advance and reduces the overhead of fingerprint calculation and comparison. Experimental results on a variety of datasets show that, compared with the Fixed-Sized Partitioning (FSP) and CDC algorithms, DWFM improves the deduplication rate by about 10% and reduces fingerprint calculation and comparison overhead by about 18%. As a result, the chunk sizes and boundaries chosen by DWFM better match the characteristics of the data, the influence of parameter settings on deduplication performance is reduced, and more duplicate data is eliminated across different types of data.
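The difference between fixed-size and content-defined boundaries can be illustrated with a small sketch. This is not DWFM: it omits the chunk-size prediction model and, unlike DWFM as described, needs its parameters `k` and `w` set in advance. It assumes a generic Winnowing step (select the rightmost minimal k-gram hash in every window of `w` hashes) and cuts a chunk after each selected k-gram. The polynomial k-gram hash (which reduces to a byte's ASCII code when `k = 1`), the SHA-1 chunk fingerprint, the deduplication-rate definition (fraction of bytes saved), and all function names are hypothetical choices for illustration.

```python
import hashlib
from typing import List

MOD = (1 << 61) - 1  # large Mersenne prime modulus for the k-gram hash


def fsp_chunks(data: bytes, size: int) -> List[bytes]:
    """Fixed-Sized Partitioning (FSP): cut every `size` bytes, ignoring content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def kgram_hashes(data: bytes, k: int) -> List[int]:
    """Polynomial hash of every k-byte window (k-gram) of `data`.

    With k == 1 each hash is just the byte value, i.e. its ASCII code.
    """
    hashes = []
    for i in range(len(data) - k + 1):
        h = 0
        for b in data[i:i + k]:
            h = (h * 257 + b) % MOD
        hashes.append(h)
    return hashes


def winnow(hashes: List[int], w: int) -> List[int]:
    """Winnowing: in every window of w consecutive hashes, select the
    rightmost minimum. Returns the sorted, de-duplicated positions."""
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        m = min(window)
        offset = w - 1 - window[::-1].index(m)  # rightmost occurrence of m
        selected.add(start + offset)
    return sorted(selected)


def winnowing_chunks(data: bytes, k: int = 4, w: int = 16) -> List[bytes]:
    """Content-defined chunking: cut right after each selected k-gram.

    Every window of w hashes contributes a selected position, so
    consecutive cuts are at most w bytes apart and no chunk is longer
    than w + k - 1 bytes.
    """
    cuts = [p + k for p in winnow(kgram_hashes(data, k), w)]
    chunks, prev = [], 0
    for cut in cuts:
        if cut > prev:
            chunks.append(data[prev:cut])
            prev = cut
    if prev < len(data):
        chunks.append(data[prev:])  # tail after the last cut
    return chunks


def dedup_rate(chunks: List[bytes]) -> float:
    """Assumed metric: fraction of bytes saved by storing each distinct
    chunk once, where chunks are identified by their SHA-1 fingerprint."""
    total = sum(len(c) for c in chunks)
    stored = {}
    for c in chunks:
        stored.setdefault(hashlib.sha1(c).digest(), len(c))
    return 1 - sum(stored.values()) / total if total else 0.0


if __name__ == "__main__":
    # Pseudo-random 2 KiB block stored twice with one byte inserted between:
    # the insertion shifts every fixed-size boundary in the second copy.
    base = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(64))
    data = base + b"X" + base
    print("FSP dedup rate:      ", round(dedup_rate(fsp_chunks(data, 64)), 3))
    print("Winnowing dedup rate:", round(dedup_rate(winnowing_chunks(data)), 3))
```

On this input the FSP rate should be close to zero, because the inserted byte misaligns every fixed-size chunk of the second copy, while the winnowing chunker finds most of the duplicate, since its boundaries depend only on nearby content and resynchronize a few bytes after the insertion.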